Spring 2018

Spatial Analysis

Spatial Analysis

Spatial Analysis begins with an exploration of the data.

  1. Mapping to see its location and distribution

  2. Asking questions of, or querying, your data.

  3. Cleaning & reshaping the data

  4. Applying analysis methods

  5. Mapping analysis results

  6. Repeat as needed

Spatial Queries

There are two key types of spatial queries

  • spatial measurement queries,
    • e.g. area, length, distance
  • spatial relationship queries,
    • e.g. what locations in A are also in B.

These types are often combined, e.g.

  • How much of region A that is within region B?

Our Questions

So far we have explored housing values for the city of San Francisco.

The data set consists of a lot of dissaggregated features represented as points.

In this section we will

  • determine the census tract id (GEOID) for each property
  • calculate the average property value per neighborhood.
  • determine the average property value within walking dist of coit tower

R Spatial Libraries

Let's load the R libraries we will use

library(sp)
library(rgdal)
library(rgeos)
library(tmap)
library(RColorBrewer)
library(ggplot2)
library(ggmap)
## leaflet, ggmap, maptools, ??
## Warning: package 'rgdal' was built under R version 3.4.3
## rgdal: version: 1.2-16, (SVN revision 701)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-5
## Warning: package 'rgeos' was built under R version 3.4.2
## rgeos version: 0.3-26, (SVN revision 560)
##  GEOS runtime version: 3.6.1-CAPI-1.10.1 r0 
##  Linking to sp version: 1.2-5 
##  Polygon checking: TRUE
## Warning: package 'tmap' was built under R version 3.4.3
## Google Maps API Terms of Service: http://developers.google.com/maps/terms.
## Please cite ggmap if you use it: see citation("ggmap") for details.

R Packages for Spatial Analysis & Mapping

sp: Classes and methods for spatial data

rgdal: for importing, exporting and transforming spatial data

rgeos: for spatial operations and queries on geometric objects

tmap: for creating interactive web maps

RColorBrewer: for selecting predefined color palettes.

ggmap/ggplot2 for geocoding locations, mapping and plotting

SF Properties 2015

Load the data if it is not already loaded

# setwd()
sfhomes <- read.csv('data/sf_properties_25ksample.csv')
sfhomes15 <- subset(sfhomes, as.numeric(SalesYear) == 2015)
class(sfhomes15)
## [1] "data.frame"
sfhomes15_sp <- sfhomes15
coordinates(sfhomes15_sp) <- c('lon','lat') # ORDER MATTERS!!
proj4string(sfhomes15_sp) <- CRS("+init=epsg:4326")

SF Census Tracts

Use readOGR to load the shapefile sf_pop_by_tracts

  • it's in the data folder

Then take a look at the data.

  • What type of sp object is it?
  • What is its CRS?
  • What column contains the population data?

SF Census Tracts

sftracts <- readOGR(dsn="./data", layer="sf_pop_by_tracts")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "sf_pop_by_tracts"
## with 196 features
## It has 10 fields
class(sftracts)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
proj4string(sftracts)
## [1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
summary(sftracts)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##          min        max
## x -123.01392 -122.32801
## y   37.69274   37.86334
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Data attributes:
##  STATEFP  COUNTYFP     TRACTCE                    AFFGEOID  
##  06:196   075:196   010100 :  1   1400000US06075010100:  1  
##                     010200 :  1   1400000US06075010200:  1  
##                     010300 :  1   1400000US06075010300:  1  
##                     010400 :  1   1400000US06075010400:  1  
##                     010500 :  1   1400000US06075010500:  1  
##                     010600 :  1   1400000US06075010600:  1  
##                     (Other):190   (Other)             :190  
##          GEOID          NAME     LSAD         ALAND        
##  06075010100:  1   101    :  1   CT:196   Min.   :  56564  
##  06075010200:  1   102    :  1            1st Qu.: 294618  
##  06075010300:  1   103    :  1            Median : 409830  
##  06075010400:  1   104    :  1            Mean   : 619651  
##  06075010500:  1   105    :  1            3rd Qu.: 660393  
##  06075010600:  1   106    :  1            Max.   :6108837  
##  (Other)    :190   (Other):190                             
##      AWATER              pop14      
##  Min.   :        0   Min.   :    0  
##  1st Qu.:        0   1st Qu.: 3100  
##  Median :        0   Median : 4111  
##  Mean   :  2082654   Mean   : 4230  
##  3rd Qu.:        0   3rd Qu.: 5080  
##  Max.   :247501271   Max.   :12391  
## 
head(sftracts@data)
##   STATEFP COUNTYFP TRACTCE             AFFGEOID       GEOID   NAME LSAD
## 0      06      075  010700 1400000US06075010700 06075010700    107   CT
## 1      06      075  012201 1400000US06075012201 06075012201 122.01   CT
## 2      06      075  013102 1400000US06075013102 06075013102 131.02   CT
## 3      06      075  013500 1400000US06075013500 06075013500    135   CT
## 4      06      075  015500 1400000US06075015500 06075015500    155   CT
## 5      06      075  016300 1400000US06075016300 06075016300    163   CT
##    ALAND AWATER pop14
## 0 183170      0  5311
## 1  92048      0  4576
## 2 237886      0  2692
## 3 235184      0  2592
## 4 311339      0  3918
## 5 245867      0  4748

Make a Quick Plot

plot(sftracts)  # or qtm(sftracts)

Remove the Farallon Islands tract

You can subset a SPDF just like a DF

Select all features with population > 0

Then replot the map

Subset

sftracts<- subset(sftracts, pop14 > 0)
plot(sftracts)

What Tract is each property in?

We need to spatially join the sftracts and sfhomes15_sp to answer this.

Spatial Join

A spatial join associates rows of data in one object with rows in another object based on the spatial relationship between the two objects.

A spatial join is based on the comparison of two sets of geometries in the same coordinate space.

This is called a spatial overlay.

Spatial overlay

Spatial overlay operations in R are implemented using the over function in the SP and rGEOS libraries.

Point-polygon overlay use SP::over

SpatialLines objects, or pairs of SpatialPolygons require package rgeos, and use gIntersects.

That's likely more detail than you need right now but the point here is that rgeos is the go-to library for vector geometry processing in R.

over

You can interperet over(x,y) as:

  • for each feature in X give me information about the first feature in Y at the corresponding location.

You can interperet over(x,y, returnList=TRUE) as:

  • for each feature in X give me information about the all features in Y at the corresponding location.

See ?over for details.

So here goes…

In what tract is each SF property is located?

homes_with_tracts <- over(sfhomes15_sp, sftracts)

Did it work?

If not, why not?

Coordinate reference systems (CRS) must be the same!

CRSs must be the same

The over function, like almost all spatial analysis functions, requires that both data sets be spatial objects (they are) with the same coordinate reference system (CRS). Let's investigate

# What is the CRS of the property data?
proj4string(sfhomes15_sp)

# What is the CRS of the census tracts?
proj4string(sftracts)

Geographic CRS

Both data are in a geographic CRS but they are different - WGS84 (sfhomes15_sp) and NAD83 (sftracts)

  1. Transform the CRS of sftracts to that of sfhomes15_sp

  2. Then test that they are identical

Transform the CRS

# Transform the CRS for tracts to be the same as that for sfhomes15_sp
sftracts2 <- spTransform(sftracts, CRS(proj4string(sfhomes15_sp)))

# make sure the CRSs are the same
proj4string(sftracts2) == proj4string(sfhomes15_sp) 
## [1] TRUE

Now let's try that overlay operation again

Try 2

# Now try the overlay operation again
# Identify the tract for each property in sfhomes15_sp
homes_with_tracts <- over(sfhomes15_sp, sftracts2)

Questions

What is our output? Does it answer our question?

What type of data object did the over function return?

head(homes_with_tracts) # take a look at the output
class(homes_with_tracts)
nrow(homes_with_tracts)
nrow(sftracts2)
nrow(sfhomes15_sp)

over discussion

Our output homes_with_tracts is a data frame that contains - the id of each property in sfhomes15_sp - all of the columns from sftracts2@data including the census tract id (GEOID)

So we are close to answering our question.

But for the data to be useful we need - to link (or join) the GEOID to the sfhomes15_sp object

Keep just the GEOID column

homes_with_tracts <- homes_with_tracts[c("GEOID")]

Add the the DF to the SPDF

We can use the base cbind (column bind) command to join the data frame to the SpatialPointsDataFrame.

The successful use of this function requires the data to be in the same order!

sfhomes15_sp@data <-cbind(sfhomes15_sp@data, homes_with_tracts)
## NOTE - binding to the @data slot!

# Review and note the change
# head(sfhomes15_sp@data)

Check in tmap

Map the data in tmap interactive mode

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(sftracts2) + tm_polygons(col="beige") + tm_shape(sfhomes15_sp) + 
  tm_dots(col="red")

Evaluation

Did the over command successfully associate the census tract GEIOD with each property?

If yes, you now could link the property data to census demographic data by GEOID.

Point-in-Polygon Queries

We just did what's called a type of spatial overlay query called a Point-in-Polygon query.

We asked "In what tract is each property located?".

WOW

Data linkage via space!

The Census Tract Perspective

We now know the tract for each property.

Now let's think about this question from the tract perspective.

Let's ask the question

  • What is the average propety value per tract?

Non-Spatial Aggregation

Since we joined GEOID to each property we can use the non-spatial aggregate function to compute the mean of totvalues for each GEOID.

mean_totvalue_by_tract <- aggregate(totvalue ~ GEOID, sfhomes15_sp, mean)

# Take a look
head(mean_totvalue_by_tract)
##         GEOID totvalue
## 1 06075010100  1315017
## 2 06075010200  1182728
## 3 06075010300  1739681
## 4 06075010400  1199131
## 5 06075010500  1537299
## 6 06075010600  1657004

Rename the column!

So that it is clear that it is the mean for the tract!

colnames(mean_totvalue_by_tract) <- c("GEOID","mean_totvalue")
head(mean_totvalue_by_tract)
##         GEOID mean_totvalue
## 1 06075010100       1315017
## 2 06075010200       1182728
## 3 06075010300       1739681
## 4 06075010400       1199131
## 5 06075010500       1537299
## 6 06075010600       1657004

Map Tracts by Mean Property Value

However, we can't map our data frame of mean_totvalues by GEOID.

We can use sp::merge to join the mean_totvalue_by_tract DF to the sftracts2 SPDF.

We should make sure that

  • the number of rows in sftracts2 and mean_totvalue_by_tract are the same

  • they share a column of common values - GEOID

Join by Attribute

When we join two data objects based on values in a column it is called a data table join by attribute.

The sp:merge makes this syntax simple for sp objects with @data slots.

sftracts2<- merge(sftracts2, mean_totvalue_by_tract, 
                  by.x="GEOID", by.y="GEOID")

IMPORTANT: DO NOT merge the DF to the @data slot! but rather to the SPDF!

## Don't do this:
 sftracts2@data <- merge(sftracts2@data, mean_totvalue_by_tract,
              by.x="GEOID", by.y="GEOID")

Take a look

head(sftracts2@data)
##          GEOID STATEFP COUNTYFP TRACTCE             AFFGEOID   NAME LSAD
## 7  06075010700      06      075  010700 1400000US06075010700    107   CT
## 20 06075012201      06      075  012201 1400000US06075012201 122.01   CT
## 36 06075013102      06      075  013102 1400000US06075013102 131.02   CT
## 40 06075013500      06      075  013500 1400000US06075013500    135   CT
## 45 06075015500      06      075  015500 1400000US06075015500    155   CT
## 54 06075016300      06      075  016300 1400000US06075016300    163   CT
##     ALAND AWATER pop14 mean_totvalue
## 7  183170      0  5311     1862440.8
## 20  92048      0  4576      614217.4
## 36 237886      0  2692     1286435.1
## 40 235184      0  2592     1493924.9
## 45 311339      0  3918      873522.1
## 54 245867      0  4748     1446370.5

Challenge

Create an interactive tmap of census tracts colored by mean_totvalue

tm_shape(sftracts2) + tm_polygons(col="mean_totvalue", style="jenks")

Aggregating Data Spatially

Above we asked "what is the census tract id for each property?"

We then used the non-spatial aggregate function to calculate the mean totvalue for each tract.

Finally, we did a spatial merge to join this results to the census tracts.

However, we can ask the same from the tract perspective using sp::aggregate

sp::aggregate

Compute mean totvalue by census tract

  • what class is the output object?
tracts_with_property_count <- aggregate(x = sfhomes15_sp["totvalue"], 
                                        by = sftracts2, FUN = mean)

Examine output

class(tracts_with_property_count)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
head(tracts_with_property_count@data)
##    totvalue
## 0 1862440.8
## 1  614217.4
## 2 1286435.1
## 3 1493924.9
## 4  873522.1
## 5 1446370.5
plot(tracts_with_property_count)

Add the GEOID

tracts_with_property_count$GEOID <- sftracts2$GEOID

Check it

Did both methods get the same result?

Check the mean totvalue for the same GEOID in each SPDF

tracts_with_property_count[tracts_with_property_count$GEOID == "06075010700", ]$totvalue 
## [1] 1862441
sftracts2[sftracts2$GEOID == "06075010700",]$mean_totvalue
## [1] 1862441

Distance queries

Distance queries

Many methods of spatial analysis are based on distance queries.

For example, point pattern analysis considers the distance between features to determine whether or not they are clustered.

We can also use distance as a way to select features spatially.

Distance

Let's compute the mean property value within 1 km of Coit Tower

First, we need to know where Coit Tower is. How can we find that out?

Geocoding a place name with ggmap

library(ggmap)
library(ggplot2)

coit_tower <- geocode('Coit Tower, San Francisco, CA')
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Coit%20Tower%2C%20San%20Francisco%2C%20CA
coit_tower_pt <- c(coit_tower$lon, coit_tower$lat) # -122.40582, 37.80239
coit_tower_pt
## [1] -122.40582   37.80239

Selecting by Distance

In order to select properties with 1KM of Coit Tower we - create a 1KM radius buffer polygon around the Coit Tower point

We then do a point-in-polygonn operation to either count the number of properties within the buffer or compute the mean totvalue.

rgeos

rgeos is another powerful and widely used library for working with geospatial data.

It is the muscle for - creating new geometries from exisiting ones - calculating spatial metrics like area, length, distance - calculating the spatial relationship between two geometries.

We can use the rgeos::gBuffer function to create our buffer polygon

Can you understand this code?

coordinates(coit_tower) <- c("lon","lat")
proj4string(coit_tower)<- CRS("+init=epsg:4326")

coit_tower_utm <- spTransform(coit_tower, CRS("+init=epsg:26910"))

library(rgeos)
coit_km_buffer <- gBuffer(coit_tower_utm, width=1000)

Map the Buffer

tm_shape(sfhomes15_sp) + tm_dots(col="blue") +
tm_shape(coit_km_buffer) + tm_borders(col="red", lwd=2) +
tm_shape(coit_tower_utm) + tm_dots(col="red")

Question

Now that we have our buffer polygon, how can we compute the mean totvalue of properties within the buffer?

Spatially Aggregate by Buffer

buff_mean <- aggregate(x = sfhomes15_sp["totvalue"], 
                       by = coit_km_buffer, FUN = mean)

CRS….

coit_buffer_lonlat <- spTransform(coit_km_buffer, 
                                  CRS(proj4string(sfhomes15_sp)))

buff_mean <- aggregate(x = sfhomes15_sp["totvalue"], 
                       by = coit_buffer_lonlat, FUN = mean)

Check Results

What is the mean property value within 1KM of Coit Tower in 2015?

View(buff_mean)

Questions?

Summary

That was a whirlwind tour of just some of the methods of spatial analysis.

There was a lot we didn't and can't cover.

Raster data is a another major topic! - but the raster package is the key

Spring semester classes

  • Geog 88: Geography & Data Science
  • Geog 187: Geographic Info Analysis
  • LA221: Quantitative Methods in Environmental Planning
  • CyPlan 204C: GIS for City Planning

Selected References

Output code to script

library(knitr)
purl("r-geospatial-workshop-f2017-pt2.Rmd", output = "scripts/r-geospatial-workshop-f2017-pt2.r", documentation = 1)